EXECUTIVE SUMMARY


Calls for universal preschool programs have
become commonplace, reinforced by
President Obama’s call for “high-quality
preschool for all” in 2013. Any program that 
could cost state and federal taxpayers $50 
billion per year warrants a closer look at the evidence on its 
effectiveness. 

This report reviews the major evaluations of preschool 
programs, including both traditional programs such as 
Head Start and those designated as “high quality.” These
evaluations do not paint a generally positive picture. The 
most methodologically rigorous evaluations find that 
the academic benefits of preschool programs are quite 
modest, and these gains fade after children enter elemen- 
tary school. This is the case for Head Start, Early Head 
Start, and also for the “high-quality” Tennessee preschool 
program. Meanwhile, most contemporary “high-quality” 
preschool programs have been evaluated using a flawed, 
non-experimental methodology called Regression Discontinuity
Design (RDD). Existing RDD studies fail to
account for children who drop out of treatment groups,
thereby biasing outcomes upwards. Further, by their nature 
RDD evaluations cannot assess the fadeout problem be- 
cause all children in the study, both treatment and control, 
have taken preschool. These problems affect the evalua- 
tion of the Tulsa, Oklahoma, program, perhaps the most 
frequently cited contemporary “high-quality” program. 

Two “high-quality” programs have been evaluated us- 
ing a rigorous experimental design, and have been shown 
to have significant academic and social benefits, including 
long-term benefits. These are the Abecedarian and Perry 
Preschool programs. However, using these two studies 
as the basis for policy is problematic for several reasons: 
the groups studied were very small, they came from single 
communities several decades ago, and both programs 
were far more intensive than the programs being
contemplated today.

Before policymakers consider huge expenditures to 
expand preschool, especially by making it universal, much 
more research is needed to demonstrate true effectiveness. 


David Armor is professor emeritus of public policy, George Mason University, and author of Maximizing Intelligence, Transaction Publishers, 2003.
INTRODUCTION

President Obama’s 2013 proposal of “high- 
quality preschool for all,” with $75 billion in 
federal startup money, inspired similar calls for 
universal preschool by state and local leaders 
throughout the country. New York City mayor
Bill de Blasio ran on a universal pre-K platform, 
funded by “taxing the rich,” while state gov- 
ernor Andrew Cuomo has also endorsed uni- 
versal pre-K funded from existing tax monies. 
Preschool has become a major issue in the gu- 
bernatorial races in Texas and Ohio, with both 
Democratic candidates endorsing universal 
pre-K. The California legislature is consider- 
ing a proposal spearheaded by Senate president 
Darrell Steinberg to expand its coverage of 
“transitional kindergarten” to all four-year-olds, 
thereby becoming a universal pre-K program. 

Expanding public schooling to cover all 
four-year-olds would require a significant in- 
crease in state and federal spending. Even if 
Congress authorized the full Obama proposal, 
which covers a 10-year period, total expenditures
for the states would be far higher. With
U.S. expenditures for public schooling from 
all sources exceeding $12,000 per student, and 
with approximately 4 million students enrolled 
in public kindergartens, states could be spend- 
ing nearly $50 billion per year to fund universal 
preschool, assuming that spending levels for 
preschool are similar to those for higher grades. 
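
The arithmetic behind the $50 billion figure can be checked directly. The short Python sketch below multiplies the two round numbers cited above. It treats $12,000 as a floor (the text says spending exceeds that amount) and assumes, as the text does, that one preschool cohort is about the size of public kindergarten enrollment. The function and constant names are illustrative only.

```python
# Figures taken from the paragraph above. Equating one preschool cohort
# with public kindergarten enrollment is the text's own assumption.
PER_STUDENT_SPENDING = 12_000         # dollars per student per year, all sources
KINDERGARTEN_ENROLLMENT = 4_000_000   # approximate public kindergarten students


def annual_universal_prek_cost(per_student: float, students: int) -> float:
    """Yearly cost in dollars if one full cohort is funded at the
    same per-student rate as the higher grades."""
    return per_student * students


cost = annual_universal_prek_cost(PER_STUDENT_SPENDING, KINDERGARTEN_ENROLLMENT)
print(f"Estimated annual cost: ${cost / 1e9:.0f} billion")
```

The product is $48 billion, which is consistent with “nearly $50 billion” once per-student spending above $12,000 is taken into account.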

With such large proposed expenditures, 
the benefits of pre-K should be clear and defin- 
itive. Yet existing research on preschool pro- 
grams does not paint a uniformly positive pic- 
ture, despite what supporters claim. Indeed, 
the most rigorous studies of contemporary 
preschool programs, particularly the federal 
Head Start program and a Tennessee universal 
program, show no lasting gains for preschool 
students after they enter regular grades. Ac- 
cording to these studies, by the time children 
reach the early elementary grades, the average 
preschool student has learned no more than 
children who were not in preschool. 

How are these results reconciled with proclamations,
including some from the White House,
about the dramatic benefits of preschool?
Are universal preschool advocates
simply ignoring scientific evidence? There is 
a body of research that finds educational ben- 
efits from preschool programs, but these stud- 
ies suffer from one of two problems: Either the 
preschool programs are not comparable with 
the programs being proposed today, or they use 
non-experimental designs that suffer from se- 
rious limitations, including an inability to track 
preschool effects into the early grades. 

Supporters of preschool programs have 
been very selective in their research citations. 
They extol programs with large pre-K effects 
as “high quality,” especially some historical 
programs and several recent pre-K programs 
in Tulsa, Oklahoma, and Boston, Massachu- 
setts. Meanwhile they ignore the Head Start 
findings or imply that it is not a high-quality 
program. 1 

The goal of this study is to offer a more bal- 
anced review of the major studies in this field, 
making it clear what we do and do not know 
about preschool programs, including both 
their short- and long-term effects on learning 
and other behaviors. 

HISTORICAL PROGRAMS 
ARE NOT COMPARABLE TO 
CONTEMPORARY STUDIES 

There are three historical “high-quality” 
preschool programs that have received consid- 
erable attention. These are the Abecedarian 
project in North Carolina, the Perry Preschool 
program in Michigan, and the Chicago Child- 
Parent preschool program. 2 These historical 
programs differ markedly from contemporary 
preschools in ways that make them uncompa- 
rable to current and proposed “high-quality” 
preschool programs. In addition, one of them 
has some serious methodological limitations. 

Abecedarian Program 

This program and study was conducted at 
the University of North Carolina in Chapel 
Hill during the early 1970s. It was a relatively 
well-designed randomized experiment involving
111 low-income infants, nearly all black, 57
of whom were in the treatment group. The 
program involved interventions from infancy 
(average of four months) to the start of kinder- 
garten. A Child Development Center provid- 
ed intensive care and education for up to 40 
hours a week for 50 weeks. Follow-up studies, 
both short- and long-term, found significant 
gains for the treatment group as compared to 
the control group on various cognitive and IQ 
tests up to age 21. However, this very intensive 
and much longer intervention is not compara- 
ble to any contemporary statewide or city pre- 
school program, most of which are designed 
for a single year of pre-K. It is also worth not- 
ing that randomization was done prior to ask- 
ing mothers’ permission to participate, and a 
higher refusal rate by control group mothers 
might have produced some bias in results. 

Perry Preschool 

The Perry Preschool program and research 
took place in Ypsilanti, Michigan, in the mid- 
1960s, and involved 123 predominantly black 
children who attended the Perry Elementary 
School. It was also a randomized experiment, 
with 59 students assigned to the preschool pro- 
gram, although the research design was com- 
promised somewhat by switching two working 
mothers to the control group and by some pro- 
gram dropouts. Many years later this program 
was the subject of a rigorous cost-benefit study 
conducted by a group of economists led by 
James Heckman, who undertook major analyses
to compensate for the design flaws. They
found that the costs of the Perry Preschool
program, which would be about $20,000 per
child in current dollars, were more than paid
back by higher employment rates, lower rates
of crime, and other economic outcomes favoring
the preschool group. 3 The Heckman
findings generated a great deal of publicity, 
and they were cited extensively by the White 
House in support of the Obama proposal. 

While the Perry Preschool program is 
more like a contemporary preschool than the 
Abecedarian project, the Perry program con- 
sisted of two years of preschool coupled with 
weekly visits with the parent. Because of the
home visits, child-teacher ratios were very 
small, just five or six children per teacher, far 
lower than contemporary pre-K programs. 
Given the critical role parents play in child 
development, home visits were an impor- 
tant feature of Perry Preschool. The very low 
child-teacher ratios and the home visits make 
the Perry Preschool experience uncomparable 
to the type of preschool programs endorsed 
by President Obama and other political lead- 
ers. Therefore, the cost-benefit results for 
the Perry program cannot be extrapolated to 
most contemporary preschool programs, even 
those that are described as “high quality.” 

Chicago Child-Parent Centers 

The Chicago Child-Parent Center program,
which started in 1967, also differs in several 
ways from standard preschool programs. Like 
Perry Preschool, there is mandatory parent 
involvement during the preschool program. 
Unlike Perry and other preschool programs, it 
also has follow-on components that continue 
into the higher grades. Also unlike Perry, it has 
never been evaluated by a randomized experi- 
mental design. Instead, control group children 
have been selected from other low-income 
schools, and statistical adjustments have been 
used to create equivalence. However, self-se- 
lection bias is hard to eliminate in studies of 
this type, particularly when a program runs 
over multiple years. There is a very high like- 
lihood that treatment outcomes are biased 
upward by attrition or dropouts who do not 
complete all years of the program. 

CONTEMPORARY EXPERIMENTAL 
STUDIES 

Among contemporary evaluations of 
regular pre-K programs initiated since the 
year 2000, two stand out because both used 
randomized experimental designs. These ex- 
perimental designs are the “gold standard” for 
evaluation studies in education. Neither study 
shows long-term effects of preschool. One is 
the national Head Start Impact Study, and the 
other is an evaluation of the statewide Tennessee
program. These experimental studies show
modest short-term effects of preschool, mea- 
sured at the end of the preschool period, but 
the effects fade after children enter the regu- 
lar elementary grades. The Tennessee study is 
of particular interest because it would qualify 
as a “high-quality” preschool program in the 
same sense as the Tulsa and Boston programs. 
A third early-childhood program evaluated 
with an experimental design is Early Head 
Start, which is modeled after the Abecedarian 
project. As such, it is a much more extensive 
(and expensive) intervention than regular pre- 
K programs, but it will be discussed briefly for 
the sake of thoroughness. 

Head Start 

Perhaps the most important of these con- 
temporary studies is the Head Start Impact 
Study (HSIS), not only because it is national 
in scope, but also because it follows students 
through third grade. 4 Head Start is also the 
longest-standing preschool program in the 
country, having begun in 1965 as part of Presi- 
dent Lyndon Johnson’s War on Poverty. In 
2012 it served nearly one million students for 
an annual cost of about $8 billion. 

The Impact Study started with a cohort of 
over 4,500 children who were 3 or 4 years old 
in the fall of 2002. 5 These students applied for
admission to a random sample of more than
300 Head Start centers in 23 states. About
2,600 children were randomly assigned to the 
Head Start condition, while about 1,800 stu- 
dents were assigned to a control group. Some 
control group students found other types of 
preschools, and some students assigned to the 
Head Start condition did not attend. Special 
techniques were used to assess how much these 
“misclassifications” affected the overall results. 

The HSIS found statistically significant ef- 
fects of Head Start for both reading and math 
skills during the preschool years. However, the 
effects were modest. A school year constitutes 
about 10 months of learning, and Head Start
students gained about 2 months more than the 
control group. There were also some significant
effects for social behaviors, particularly parents
showing more awareness of health care issues. 

Unfortunately, these positive effects did 
not last beyond the kindergarten year, and 
there were also no effects on reading and math 
skills at the end of first or third grades. Like- 
wise, there were no major effects on prob- 
lem behaviors lasting into kindergarten, first 
grade, or third grade. 

The HSIS has been criticized because some 
Head Start students did not finish the Head 
Start year, and some control group students 
sought out and entered some other kind of 
preschool. 6 The original study defended this as 
a realistic condition, and in any event it would 
be unethical to prevent the parents of control 
group children from seeking out other pre- 
K programs. Nonetheless, the effect of Head 
Start might be reduced by these “crossovers,” 
since some control group children also had 
preschool. Various statistical analyses were un- 
dertaken in the original study to deal with this 
issue, and a special analysis was undertaken by 
Peter M. Bernardy. 7 The results of the Bernardy
analysis are shown in Figure 1. In this re-analysis 
of the HSIS data, none of the control group had 
any formal preschool, and all of the Head Start 
students attended the Head Start program for 
two years. This approach intends to assess the 
“full” effect of Head Start as compared to chil- 
dren with no preschool experience. 

In this special analysis, the Head Start effect
at the end of the first pre-K year is 7 points
and is statistically significant, representing a
3.5-month advantage for Head Start students
over the control group (who gained 0 in this analysis).
That advantage shrinks to 5 points in the
second year of Head Start (about a 2-month 
advantage) and is still statistically significant. 
But by the end of kindergarten both Head 
Start and control students score the same, and 
by the end of first grade the control students 
are scoring 2 points higher than Head Start 
students, although this is not a statistically sig- 
nificant difference. 

Note the very large academic skill gains 
made each year by both groups of students 
during kindergarten and first grade years. The 
Figure 1

Academic Skill Gains for Head Start vs. Control

Source: Peter M. Bernardy, “Head Start: Assessing Common Explanations for the Apparent Disappearance of Initial Positive Effects” (PhD diss., George Mason University, 2012).


7-point gain for three-year-old Head Start stu- 
dents is dwarfed by the gains for both groups 
during the regular school years, which are 
about 40 points each year. One of the reasons 
it is very difficult to realize lasting effects of 
a preschool program is that children in early 
elementary school are in their major develop- 
ment years, and they are learning four times as 
much material during a regular school year as 
they do in the preschool years, when children 
are just starting to learn more words, manipu- 
late numbers, and so forth. 

Tennessee Pre-K 

The Voluntary Pre-K for Tennessee Initia- 
tive started in 2005, and by 2007 it had grown 
to more than 18,000 participants. 8 The pro- 
gram gives priority to high-risk students, as 
determined by poverty status (first priority), 
children with disabilities, and children with 
limited English proficiency. Like most other 
state programs, such as those in Tulsa, Okla- 
homa, and Boston, Massachusetts, preschool 
teachers must be licensed teachers with pre-K 
certification, and most classes have teaching
assistants. It is a full-day program, requiring
a minimum of 5.5 hours of instruction for five 
days a week. As such, this program merits the 
“high quality” label applied to the Tulsa and 
Boston studies. 

Unlike the case in the Tulsa and Boston 
preschool studies, the evaluation of the Ten- 
nessee state preschool program utilized an 
experimental design whereby 3,000 children 
were randomly assigned to treatment and control
conditions in the 2009-10 and 2010-11
school years. A subgroup of 1,100 children was 
randomly selected for an “intensive sub-study” 
involving additional testing and data collec- 
tion. This study had one major design flaw, in 
that parental permission for data collection 
was obtained after the random assignment was 
made, and the control group had lower partici- 
pation rates than the treatment group, render- 
ing this no longer a randomized controlled 
trial but instead only a quasi-experiment. Pro- 
pensity analysis was used to generate equiva- 
lent treatment and control groups. 

During the initial evaluation, significant 
preschool effects were found for academic 
achievement, and the magnitudes were modest
and similar to effects found in the Head
Start Impact Study. For example, the composite
academic scale showed an increase of about
2.5 months of learning compared to the con- 
trol group. At the kindergarten and first grade 
follow-ups, however, the impacts had dimin- 
ished. In the words of the investigators, 

the effects of [Tennessee Voluntary
Pre-K] on the . . . achievement measures
observed at the end of the pre-k
year had greatly diminished by the end 
of the kindergarten year and the differ- 
ences between participants and non- 
participants were no longer statistically 
significant. . . . Similarly, at the end of 
first grade, there were no statistically 
significant differences between TN- 
VPK participants and nonparticipants 
on the [Woodcock-Johnson] measures
with one exception . . . that favored the 
nonparticipant group. 9 

Similarly, at the end of first grade there 
were no significant differences on a series of 
behavioral outcomes as rated by teachers, including
social skills, work-related skills, peer
relations, and behavioral problems. In the in- 
tensely studied subgroup there was, however, 
a somewhat higher rate of retention in kinder- 
garten for the control group, 6 percent versus 
4 percent (it was 8 percent versus 4 percent for 
the full 2009-10 cohort). 

In contrast to the non-experimental de- 
signs used for the Tulsa and Boston evalu- 
ations, the Tennessee randomized (quasi-) 
experiment had results very similar to those 
of the national Head Start evaluation. There 
were significant benefits during the preschool 
year, but the effects faded when students tran- 
sitioned to the regular school grades, during 
which time nonpreschool children caught up. 

Early Head Start 

Early Head Start is a program designed for 
pregnant women and infants, providing in- 
home and center-based education and health
services for both the child and the parent(s),
extending from the prenatal period until age 
three. 10 The experimental evaluation of Early 
Head Start involved 3,000 children and their 
families who entered the program between 
1996 and 1998 and were randomly assigned to 
a control group and an Early Head Start treat- 
ment group. They were followed up at age 
three, age five, and then at grade five. The pro- 
gram is significantly more costly than regular 
Head Start; one study estimated annual costs 
in 2009 to be more than $11,000 per child/ 
family per year, so that the full three years 
would cost well over $30,000 per child. 11 

The outcomes at fifth grade were not very 
impressive, particularly given the cost of this 
program. Out of a total of more than 56 aca- 
demic, socio-emotional, parenting, behav- 
ioral, and mental health outcomes, only two 
were statistically significant and favored Early 
Head Start: a socio-emotional success index 
and an anxiety/depression rating. The sizes of 
the impacts were quite small, being just one- 
tenth of a standard deviation. There were no 
overall effects on cognitive performance (a 
major objective of all preschool education 
programs), good parenting behaviors, or re- 
duced negative behaviors such as aggression 
and rule-breaking. 

There were significant positive impacts on 
several outcomes for the African American 
subgroup, mainly in the socio-emotional do- 
main and improved parenting behaviors (e.g., 
involvement in school and less alcohol use). 
There were no significant impacts on African 
American cognitive outcomes, which is disap- 
pointing to those who hoped that earlier in- 
tervention might help close the achievement 
gap. Perhaps most disappointing was that the 
highest-risk subgroup (including all races)— 
the children of single mothers, on welfare, 
unemployed, and lacking a high school diplo- 
ma— had no positive outcomes and numerous 
significant negative impacts, especially in the 
cognitive domain. 

In short, there is little evidence that, even 
if policymakers are willing to spend the money, 
Early Head Start is the model that will produce
substantial and consistent improvement
in social and academic behavior for disadvan- 
taged children. 

PROBLEMS WITH 
CONTEMPORARY 
NON-EXPERIMENTAL STUDIES 

Several contemporary preschool programs 
have been evaluated over the past 10 years
and have been cited in many press accounts 
as evidence of the success of “high-quality” 
universal preschool programs. These include 
state-sponsored preschool programs involving 
multiple schools in New Jersey and Georgia, 
and single-city programs in Tulsa, Oklahoma, 
and Boston, Massachusetts. 12 An evaluation of
statewide universal programs in five states also 
used the same methodology. 

None of these studies used experimen- 
tal designs, meaning randomized assignment 
of children to preschool and nonpreschool 
conditions. Instead, all of these studies used 
“regression discontinuity designs,” or RDD. 
The treatment group consisted of children 
just starting kindergarten who completed 
preschool the year before, while the control 
group consisted of children just starting pre- 
school; testing was done at the beginning of 
the school year for both groups. The design 
required a strict birth month cutoff (usually 
September 1) so that children who were four
years old as of that date were eligible for pre- 
school, while younger children had to wait 
until the following year. RDD also assumes 
that these two groups are identical except for 
their age and the fact that one has completed 
preschool while the other has not. The stud- 
ies used regression analysis to control for the 
age difference, since age is strongly correlated 
with test scores. 
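
As a rough illustration of the mechanics just described, the Python sketch below estimates an RDD effect as the gap between two fitted lines at the age cutoff. It is a minimal sketch under stated assumptions, not the specification used in any of the studies reviewed here: the data are hypothetical and noise-free, the 60-month cutoff and the 3-point effect are invented for the example, and fitting a separate line on each side of the cutoff is only one common way to control for age.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def rdd_effect(ages, scores, cutoff):
    """Estimate the jump in scores at the age cutoff.

    Children at or above the cutoff form the treatment group (finished
    preschool, now starting kindergarten); younger children form the
    control group (just starting preschool).
    """
    # Center age on the cutoff so each intercept is the fitted score there.
    treated = [(a - cutoff, s) for a, s in zip(ages, scores) if a >= cutoff]
    control = [(a - cutoff, s) for a, s in zip(ages, scores) if a < cutoff]
    t_intercept, _ = fit_line(*zip(*treated))
    c_intercept, _ = fit_line(*zip(*control))
    return t_intercept - c_intercept


# Hypothetical, noise-free data: scores rise 0.5 points per month of age,
# and completing preschool adds 3 points (both values are assumptions).
CUTOFF = 60  # age in months at testing that separates the two groups
ages = list(range(48, 72))
scores = [20 + 0.5 * a + (3 if a >= CUTOFF else 0) for a in ages]
effect = rdd_effect(ages, scores, CUTOFF)
print(round(effect, 2))
```

With clean data and identical groups the method recovers the 3-point effect built into the example. The problems discussed next concern what happens when the groups are not in fact identical.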

The Institute for Education Sciences in 
the U.S. Department of Education publishes 
standards for RDD studies that must be met 
in order to make a valid causal inference about 
treatment effects. 13 At least one of these standards,
and perhaps two, have been violated by
the published RDD studies of preschool. First,
if there is any attrition from either group, it
must be recorded and reported. Adjustments
can be made if effects are small; otherwise valid
inferences cannot be drawn. Attrition was
not reported for either the Tulsa or the New 
Jersey RDD studies; it was reported for the 
Boston studies, but the level of bias was not 
evaluated as required by the standards. 

The attrition problem is particularly impor- 
tant for RDD studies because attrition occurs 
only for the treatment group. Children who 
start preschool but who do not complete the 
program are no longer in the treatment group 
and are therefore not tested at the beginning 
of kindergarten. Since the control group chil- 
dren are just starting preschool, there can be 
no attrition for the control group. Program 
dropouts, whether due to mobility or difficulty 
with the preschool program content, are likely 
to have lower test scores than those who re- 
main in the program, and therefore scores of 
the treated group can be biased upward by an 
unknown amount. Moreover, there are no pre- 
treatment scores for the outcome measures, 
which could be used to make adjustments if at- 
trition is not too large. 

This problem is illustrated using hypotheti- 
cal data in Figure 2. The solid regression lines 
for the treatment and control groups show a 
slight negative effect for treatment (about mi- 
nus 1 point on the test score). The dotted line 
shows the treatment regression line after re- 
moving test scores for five dropouts (about 12 
percent), which now indicates a statistically 
significant positive effect of about three points. 
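
The direction of this bias can be reproduced with a small simulation. The Python sketch below does not recreate the data behind Figure 2. It uses its own hypothetical sample in which preschool has no true effect, then removes the lowest-skill 12 percent of the treatment group before testing. The sample size, score equation, and dropout rule are all assumptions made for illustration.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def boundary_gap(treated, control):
    """Treatment-minus-control fitted score at the cutoff (x = 0)."""
    t0, _ = fit_line([x for x, _ in treated], [y for _, y in treated])
    c0, _ = fit_line([x for x, _ in control], [y for _, y in control])
    return t0 - c0


def simulate(n_per_group=4000, dropout_share=0.12, seed=0):
    rng = random.Random(seed)

    def child(lo, hi):
        months = rng.uniform(lo, hi)   # age in months relative to the cutoff
        skill = rng.gauss(0, 5)        # latent skill, unrelated to preschool
        return months, 50 + 0.5 * months + skill, skill

    # True preschool effect is zero: both groups share one score equation.
    treated = [child(0, 12) for _ in range(n_per_group)]
    control = [child(-12, 0) for _ in range(n_per_group)]
    control_xy = [(m, s) for m, s, _ in control]

    full = boundary_gap([(m, s) for m, s, _ in treated], control_xy)

    # Assumed dropout rule: the lowest-skill treated children leave
    # preschool early and are never tested. The control group loses no one.
    n_drop = round(dropout_share * len(treated))
    stayers = sorted(treated, key=lambda c: c[2])[n_drop:]
    observed = boundary_gap([(m, s) for m, s, _ in stayers], control_xy)
    return full, observed


full, observed = simulate()
print(f"no attrition: {full:+.2f}   with attrition: {observed:+.2f}")
```

Because only the treatment group loses children, the estimated gap at the cutoff rises even though the simulated program does nothing.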

The second requirement for a reliable re- 
gression discontinuity study is that the “forc- 
ing” variable used to assign children to treat- 
ment and control groups, in this case the 
cut-off age, must not be confounded with any 
other characteristic that might cause different 
behaviors in the treatment and control group. 
It is likely that the age variable affects many 
other variables, such as the friends a child 
plays with, the amount of time spent in out- 
of-home care, and activities with parents. As 
a result, age is a problematic choice of forcing 
variable for the study of pre-school effects. 
Figure 2

Hypothetical Example of RDD Method

[Figure not reproduced. Horizontal axis: age in months.]

Source: Author’s calculations using hypothetical data.


A third problem, which is not mentioned in 
published standards, is the timing of skill test- 
ing. There is no problem if all children are test- 
ed on the very first day of schooling or within a 
few days of the start of school. This may not be 
realistic, however, since testing can take up to 
two months or more to complete. The prob- 
lem arises if the control group testing is com- 
pleted in a shorter interval than the treatment 
group, which is possible given the different 
types of classroom settings. Since cognitive 
growth rates are much higher in kindergarten 
than preschool (see Figure 1), this could result
in an upward bias of treatment effects. There 
might be ways to ameliorate this problem, but 
existing studies do not report on the timing or 
duration of testing for the two groups. 
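
A back-of-the-envelope calculation suggests how large this bias could be. The Python sketch below borrows the approximate yearly gains from the Figure 1 discussion (about 40 points in kindergarten, 7 points in the first Head Start year) and the 10-month school year. The testing delays are hypothetical, and the point scale of the Head Start reanalysis may not carry over to the tests used in the RDD studies.

```python
SCHOOL_YEAR_MONTHS = 10          # "about 10 months of learning" per school year
KINDERGARTEN_GAIN_PER_YEAR = 40  # points, from the Figure 1 discussion
PRESCHOOL_GAIN_PER_YEAR = 7      # points, first Head Start year in Figure 1


def timing_bias(treated_delay_months, control_delay_months):
    """Points of spurious 'effect' created by delays in testing.

    Each group keeps learning at its current classroom's rate until it
    is tested, so a later-tested kindergarten group banks extra growth.
    """
    k_rate = KINDERGARTEN_GAIN_PER_YEAR / SCHOOL_YEAR_MONTHS
    p_rate = PRESCHOOL_GAIN_PER_YEAR / SCHOOL_YEAR_MONTHS
    return k_rate * treated_delay_months - p_rate * control_delay_months


# Hypothetical delays: kindergartners tested 1.5 months into the year on
# average, preschoolers 0.5 months in.
bias = timing_bias(1.5, 0.5)
print(round(bias, 2))
```

Under these assumed delays the treatment group would appear about 5.65 points ahead before any preschool effect is counted.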

A fourth problem, and perhaps the most im- 
portant for preschool studies, is that an RDD 
study can only assess the effect of preschool for 
the year of treatment; it cannot compare the 
treatment and control group for the fadeout 
problem in later grades. Unlike an experimental
design, both the treatment and the control
group will have had preschool, and thus there 
is no “identical” group that has not had pre- 
school. The long-term effect of preschool is 
the critical policy issue, because there is little 
educational value for preschool effects that oc- 
cur during the preschool year but that do not 
extend into the later grades, as when a 1st- or
2nd-grade student who has had preschool is 
compared to a student who has not. 14 

Because of these problems, none of the 
studies discussed below provide a rigorous 
assessment of either the short- or long-term 
effects of preschool. They are reviewed here 
because many commentators (including the 
studies’ authors) consider them to demon- 
strate the efficacy of high-quality programs. 

The Abbott Preschool Program

The New Jersey Abbott Preschool program
is offered in 31 predominantly low-income 
school districts and serves more than 40,000 
three- and four-year-old children, which constitutes
approximately 80 percent of the eligible
children in those districts. The program
was implemented in 1999 as a result of a court 
order. A major RDD evaluation was carried 
out for approximately 1,500 children, half of 
whom attended kindergarten in 2005-06 (the 
treatment group) and half of whom attended 
preschool (the control group). No informa- 
tion was provided on the number of children 
who had started preschool the year before 
but had not completed the program, nor was 
information available about when the testing 
was completed for the two groups. This information
may not be so important for the Abbott
program, however, because the effects of
preschool were only slightly larger than those 
found in the national Head Start program 
(about 3 months of learning for verbal skills 
and 3.5 months for math). 15 

The Tulsa Program 

Oklahoma established a universal, volun- 
tary, pre-K program for four-year-olds in 1998, 
and at the time of the Tulsa study about 63 
percent of its four-year-old population was en- 
rolled in the state program; another 10 percent 
or so was enrolled in Head Start. This gives 
Oklahoma the highest rate of pre-K enroll- 
ment in the nation; Georgia is a close second 
with 63 percent enrolled in total. The
Tulsa study reviewed here was an RDD evalu- 
ation for approximately 3,000 students en- 
rolled in pre-K or kindergarten in September 
of 2003. 16 

No detailed information is provided about 
how long the testing took, except for a state- 
ment that “testing took place, for the most 
part, during the first week of school.” No in- 
formation is available, either, about the num- 
ber of students who started pre-K but dropped 
out before completing the year and thus are 
not counted as part of the treatment group. 
Some tabulations suggest that this might be 
a major issue. For example, the pre-K control 
group was 18 percent Hispanic, and 26 percent 
of mothers were high school dropouts, while 
the kindergarten treatment group was 11 percent
Hispanic, and only 16 percent of mothers
were high school dropouts. Both of these
characteristics are strongly correlated with 
test scores. These differences could have been 
created by children from these backgrounds 
being more likely to drop out of the pre-K pro- 
gram. While these characteristics were sta- 
tistically taken into account in the regression 
analysis, the differences suggest the treated 
and control groups were not “identical” and 
could differ on unmeasured (and therefore un- 
controlled) characteristics, especially motiva- 
tion and initial cognitive skills. 

The impact of the Tulsa preschool program 
on verbal skills for this cohort of children was 
8 months, or almost a full year of schooling; a 
later evaluation reported an effect of one full 
year. This is truly a staggering effect; it is un- 
precedented for any type of school program. It 
is four times the typical Head Start effect, and 
greater than any other “high-quality” program, 
including the Abecedarian and Perry Preschool 
programs, both of which were far more inten- 
sive. It is likely that this effect is an artifact of 
the RDD method, especially a high dropout 
rate from the treatment group. This conclusion 
is reinforced by another study of Oklahoma 
children that did not produce such extraordi- 
nary results (see the five-state study below). 

The Boston Program 

The Boston preschool program differs to 
some extent because it was not a statewide 
program; rather, it was developed by the Bos- 
ton Schools working with researchers at the 
Harvard Graduate School of Education. It 
was initiated in 2005 but was not fully imple- 
mented and operational for several years. The 
evaluation took place for students who at- 
tended preschool in 2008-09; testing com- 
menced the following year (September 2009) 
when the treatment group (969 children) was 
starting kindergarten and the control group 
(1049 children) was starting preschool. At that 
time about one-third of eligible four-year-olds 
attended preschool. 

The study reported a sizable attrition (drop- 
out) rate for the treatment group of about 22 
percent and stated that the dropouts differed
significantly from the completion group. Although
adjustments were made using available
administrative records, the most important in- 
formation needed for such adjustments, such 
as pre-treatment cognitive skills and motivation,
is not available in an RDD study. This can
introduce a significant bias in the results. 
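
The direction of this bias can be illustrated with a small simulation. The sketch below is hypothetical and is not taken from the Boston study: it assumes an unmeasured skill variable with a standard normal distribution, a true preschool effect of 0.2 standard deviations, and dropout that is more likely for lower-skill children. Only the 22 percent dropout rate comes from the study; the function name and the `selectivity` parameter are illustrative assumptions.

```python
import random
import statistics


def simulate_attrition_bias(n=20000, true_effect=0.2, dropout_rate=0.22,
                            selectivity=1.0, seed=0):
    """Return (estimate_with_attrition, estimate_without_attrition) in SD units.

    Hypothetical model: each child has an unmeasured skill drawn from a
    standard normal distribution. Control children score `skill`; treated
    children score `skill + true_effect`. Treated children with lower skill
    are more likely to drop out, and only completers are tested.
    """
    rng = random.Random(seed)
    control = [rng.gauss(0, 1) for _ in range(n)]
    treated_skill = [rng.gauss(0, 1) for _ in range(n)]

    # Rank treated children by a noisy version of skill; the lowest-ranked
    # fraction drops out. selectivity=0 makes dropout unrelated to skill.
    noisy = [(selectivity * s + rng.gauss(0, 1), s) for s in treated_skill]
    noisy.sort(key=lambda pair: pair[0])
    n_drop = int(dropout_rate * n)

    completers = [s + true_effect for _, s in noisy[n_drop:]]
    everyone = [s + true_effect for s in treated_skill]

    with_attrition = statistics.mean(completers) - statistics.mean(control)
    without_attrition = statistics.mean(everyone) - statistics.mean(control)
    return with_attrition, without_attrition
```

Under these assumptions, comparing only the completers with the control group overstates the true effect, while setting `selectivity=0` (dropout unrelated to skill) leaves the estimate close to the true value. The sketch omits the age cutoff and the regression adjustments used in actual RDD studies.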

Equally important, although children be- 
gan school in mid-September, testing did not 
begin until the end of September, and only 
one-third of the students had been tested 
by the end of October. Most of the testing was
not completed until the end of November, more
than halfway through the semester. This can
introduce another source
of bias if the treatment group finished testing 
later than the control group. 

The preschool effects on verbal skills
were smaller than those in the Tulsa studies
but were still very large: about six months of
growth, or three times the effect found in
the national Head Start evaluation. Given that
these results equaled the effects found for the 
Abecedarian Project after five years of pre- 
school, and given the reported dropout rate, it 
is likely that the treatment and control groups 
were not equivalent. 

Study of Five States 

A special RDD study was undertaken for 
universal preschool programs in five states: 
Michigan, New Jersey, Oklahoma, South Car- 
olina, and West Virginia. 17 Random samples 
were drawn for preschool classrooms, and ran- 
dom samples of students were drawn from the 
sampled classes. Kindergarten classes were 
then selected from the same school as the 
sampled preschool classroom. Children were 
tested in the early fall of the 2004-05 school 
year. Sample sizes ranged from 720 to 870 for 
all states except New Jersey, where the sample
was about 2,000. Treatment and control sam- 
ples were each about half of the total samples. 

In the study no information was provided 
about attrition or dropout from the treatment 
groups, and there was no detailed information 
about the time of testing; we know only that it 
was done in “early fall.” Attrition and time of
testing might be less important for this study,
however, because the effects were not large for 
most states. Significant effects for verbal skills 
were only found for New Jersey and Okla- 
homa, and the magnitudes were only mod- 
estly higher than Head Start results (3.5 and 3 
months, respectively). The effect for the Okla- 
homa sample is much smaller than that found 
in the Tulsa study. The Oklahoma sample did 
not show a significant effect for math, but 
the Michigan sample produced a very strong 
effect for math, nearly 5 months. This is puz- 
zling, because Michigan also showed a small 
negative effect on reading. No significant ef- 
fects were found for verbal or math skills for 
South Carolina and West Virginia. 

The Georgia Study 

The Georgia study is the most recent RDD 
study to be published. 18 Data for the evaluation 
were collected in the fall of 2012 for about 1,200
children: 611 had completed pre-K the
previous school year (treatment group), and the
remaining children (control group) were
just entering pre-K. The children were recruited
from a random sample of 90 pre-K classrooms
from the nearly 4,000 pre-K classrooms
operating in Georgia as of August 2011. 

No systematic information is provided for 
treatment-group attrition, or children who 
started preschool but dropped out before fin- 
ishing. The treatment and control groups were 
similar on most demographic characteristics 
except race and English-language proficiency
(based on the preLAS test). The control group
had a somewhat higher proportion of black 
students (just 4 percentage points), but a much 
lower rate of English-language proficiency. 

Nearly three-fourths of the treatment 
group were fully fluent in English compared 
to only 46 percent of the control group, and 
only 8 percent of the treatment group were 
limited or non-English speakers compared to 
26 percent of the control group. This result 
renders the groups non-equivalent on a critical 
skill, and it is evidence that nonfluent English 
speakers were less likely to finish the pre-K 
year. Although the study controlled for English
fluency in the regression analysis, the
non-equivalence on this characteristic raises the
possibility of non-equivalence on other char- 
acteristics including cognitive skills in general. 

Another concern is the timing of assess- 
ment. Although Georgia public schools start 
in early August, the report states that assess- 
ment took place between September 21 and 
December 20, near the end of the first semes- 
ter. If assessment took longer for the treat- 
ment group in kindergarten, this could intro- 
duce another factor that would lead to higher 
test scores for the treatment group. 

The effects of the Georgia program resem- 
ble those of the Tulsa study, with an effect on 
verbal skills of a full school year, and an effect 
of a half-year on math skills. As with the Tulsa 
study, these extraordinarily large effects likely 
reflect an upward bias due to non-equivalent 
treatment and control groups arising from the 
RDD technique. 

A COMMENT ON META-ANALYSES 
OF PRESCHOOL PROGRAMS 

In addition to the specific studies reviewed 
here, there are several meta-analyses of preschool
evaluations that combine results from
many individual programs and derive over- 
all estimates of preschool effects. Generally 
these studies place less emphasis on study de- 
signs and more on arriving at an average effect 
size attributable to preschool programs.

One of the most comprehensive of these 
reviews examined 84 individual studies includ- 
ing all of the studies listed above. 19 It arrived at 
an overall estimate of about 0.2 standard deviations
during the program year after adjusting
for sample sizes. This is about 2 months of a 
normal school year during the early elemen- 
tary years. This effect size is very similar to 
the average Head Start effect as estimated in 
the HSIS. The review also found that nearly 
all follow-up studies showed that the effect 
faded out after children started regular school. 
In other words, this meta-analysis produced 
average effects that were no larger than Head 
Start and tended to fade out in grade school. 
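
The conversion used in this paragraph can be written out explicitly. The sketch below assumes a simple linear relationship taken from the figures above (0.2 standard deviations is about 2 months of schooling, or 0.1 standard deviations per month). The function names are illustrative, and the linear rate is an approximation that applies only to the early elementary years.

```python
# Rate implied by the text: 0.2 SD corresponds to about 2 months of schooling.
# This linear rate is an assumption for illustration, not a published constant.
SD_PER_MONTH = 0.2 / 2.0


def sd_to_months(effect_sd):
    """Convert an effect size in standard deviations to months of schooling."""
    return effect_sd / SD_PER_MONTH


def months_to_sd(months):
    """Convert months of schooling to an effect size in standard deviations."""
    return months * SD_PER_MONTH
```

For example, `sd_to_months(0.2)` gives about 2 months. Under the same assumed rate, an effect reported as six months of growth would correspond to roughly 0.6 standard deviations.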

THE QUALITY ISSUE:
WHAT CONSTITUTES
HIGH-QUALITY PRESCHOOL?

Many discussions of preschool distinguish 
“high-quality” preschools from others that 
are presumably not high quality. The classic
Abecedarian and Perry Preschool programs, 
as well as the contemporary Tulsa and Bos- 
ton preschool programs, have been described 
as high quality. According to the office that 
oversees the Tennessee preschool program, it 
is a “national leader in pre-K quality, achiev- 
ing 9 out of 10 quality standard benchmarks 
of the National Institute for Early Education 
Research (NIEER).” 20 It is generally under- 
stood that, because it is a national program 
with thousands of pre-K programs, Head Start 
classrooms vary in quality.

What is high quality? At least one educa- 
tor has characterized quality as one of the 
most confusing issues when evaluating pre- 
K programs. 21 Although there are many po- 
tential dimensions to the quality debate, the 
two most important factors cited by many 
researchers are teacher education and pre-K 
classroom quality. The state programs that
have incorporated pre-K into regular public 
elementary schools generally have teachers 
with BA degrees, and many with pre-K certifi- 
cates, because that is required for public school 
teachers. In contrast, the national Head Start 
program does not require BA degrees, and in 
the HSIS only about 30 percent of Head Start
teachers had BA degrees while another 30 per- 
cent had AA degrees. 

The NIEER describes a preschool class- 
room quality rating system called ECERS-R, 
which rates preschool classrooms on seven 
different dimensions of quality including the 
classroom space and furnishings, instructional 
activities, teacher-child interactions, mate- 
rials, and relations with parents. The rating 
scale ranges from 1 (poor) to 7 (excellent), and a 
score of 5 is considered “good.” The rating sys- 
tem has been used for many years to rate the 
quality of preschool programs, including Head 
Start programs.

Classroom ratings using this instrument
are available for Head Start, the New Jersey 
Abbott Program, and Boston. In the 2007 New
Jersey Study, the average classroom quality rat- 
ing was 4.8, and 44 percent of classrooms were 
rated “good” or better. In Boston, 48 percent 
of classrooms were rated “good” or better in 
2007, the year before the formal pre-K evalu- 
ation started. 

For the Head Start programs in the nation- 
al Impact Study, the average classroom quality 
ratings were actually higher than for the New 
Jersey and Boston programs. The average qual- 
ity was 5.2, and 70 percent of the classrooms 
were rated “good” or better. In other words, 
Head Start scored higher in quality than the 
reportedly better performing Abbott and Bos- 
ton programs. 

The more important question is whether 
teacher education and assessed classroom 
quality make a significant difference in a preschooler’s
learning. Most middle-class parents
do a pretty good job in preparing their chil- 
dren for school without a formal curriculum; 
there are lots of materials available to help. 
Moreover, for 3- and 4-year-olds there is no 
clear reason why a BA degree is necessary for 
promoting growth in basic skills, as might be 
the case for older children. 

This commonsense reasoning is borne out
by formal research. For example, a special re- 
view of several national and state databases 
on preschool programs did not find a signifi- 
cant relationship between teachers having 
a BA degree and various learning outcomes 
for preschoolers. 22 Equally important, a new 
study using national long-term data on infants 
and children found no relationship between 
quality of preschool classrooms and learning
or social outcomes once a child’s demographic 
and family characteristics were taken into ac- 
count. 23 

CONCLUSIONS AND POLICY 
IMPLICATIONS 

This review of preschool research leads to 
several conclusions. First, historical projects
such as the Abecedarian and Perry Preschool
programs involved much more intensive interventions
than contemporary preschool programs.
Moreover, they each tested a very small sample 
drawn from a single community. As such, their 
results should not be generalized to current 
policies and should not be used to make infer- 
ences about the types of preschool programs 
being operated today. 

Second, although preschool programs eval- 
uated by the most rigorous research designs 
show modest but statistically significant im- 
provements during the preschool years, these 
gains fade as children move into kindergarten
and first grade. The fadeout might be more
accurately described as “catch up,” because the 
cognitive growth that occurs for all children in 
the early elementary grades is far greater than 
the gains during the preschool years, so it may 
be that children who did not have preschool 
simply caught up with those who did. 

Third, all of the studies finding very large 
pre-K effects, particularly those in Tulsa, Bos- 
ton, and Georgia, used regression discontinui- 
ty designs. These are not experimental designs, 
and their validity depends on demonstrating 
that the treatment and control groups (those
having completed pre-K and those just starting
pre-K, respectively) are close to identical
but for the treatment. That evidence is miss- 
ing in several respects, particularly attrition 
from the treatment group and time of testing. 
Most important, RDD studies cannot conduct
longer-term follow-ups because both the
treatment and control groups have had pre- 
school. 

While the Perry Preschool results might 
not generalize to present-day pre-K programs, 
it is worth mentioning a viewpoint articulated 
by James Heckman and his colleagues about
the potential role of preschool in changing 
future behavior. Their research leads to a so- 
phisticated model that posits benefits of early 
intervention programs (and also adolescent 
intervention) leading to desirable adult be- 
haviors, particularly high-school graduation, 
employment, and reduced criminality. 24 This
suggests a need for more, longer-term studies
of children from preschool programs to determine
adult behavioral changes.

Before considering expanding preschool
offerings, and especially before making them
universal, policymakers need to seek more randomized
trials that track control and treatment groups 
over several years, and they should only at- 
tempt to replicate programs that have statis- 
tically significant, lasting effects, which can 
be achieved at scale and an affordable price. 
The original logic of helping disadvantaged 
children catch up may still be valid, because 
Head Start-type programs do show modest 
benefits during the preschool year. However, 
the proposal to expand preschool to every- 
one defeats the purpose of closing achieve- 
ment gaps by giving disadvantaged children a 
“head start.” More importantly, the evidence 
as it currently exists demonstrates only short- 
term skill gains that fade after a few years, and 
there is insufficient evidence (two small, old,
very intensive programs) for the type of long-
term behavioral changes envisioned by the 
Heckman group. 

Pre-K education may help, but the research 
to date does not support expanding existing 
government programs. New preschool pro- 
grams should not be introduced unless they 
have statistically significant, non-negligible 
benefits.